Learning from Video
Robot foundation models
Robot systems
Video foundation models
EgoNight: Towards Egocentric Vision Understanding at Night with a Challenging Benchmark
https://arxiv.org/abs/2510.06218
Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos
https://arxiv.org/abs/2412.04445
HumanoidExo: Scalable Whole-Body Humanoid Manipulation via Wearable Exoskeleton
https://arxiv.org/pdf/2510.03022
VideoMimic
https://www.videomimic.net/
Scaling Egocentric Vision: The EPIC-KITCHENS Dataset
https://openaccess.thecvf.com/content_ECCV_2018/papers/Dima_Damen_Scaling_Egocentric_Vision_ECCV_2018_paper.pdf
Challenges and Trends in Egocentric Vision: A Survey
https://arxiv.org/html/2503.15275v1
Perceiving and Acting in First-Person: A Dataset and Benchmark for Egocentric Human-Object-Human Interactions
https://arxiv.org/html/2508.04681v1
One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning
https://www.roboticsproceedings.org/rss14/p02.pdf
EgoMimic: Scaling Imitation Learning via Egocentric Video
https://openreview.net/pdf/da5952e56d3c5b2704518851708c6a97e0a43d28.pdf
Automatic Generation of Two-Level Hierarchical Tutorials from Instructional Makeup Videos
https://hci.stanford.edu/publications/2021/truong_auto/truong_2021.pdf
ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions
https://openaccess.thecvf.com/content/CVPR2025/papers/Soucek_ShowHowTo_Generating_Scene-Conditioned_Step-by-Step_Visual_Instructions_CVPR_2025_paper.pdf
Aligning Step-by-Step Instructional Diagrams to Video Demonstrations
https://openaccess.thecvf.com/content/CVPR2023/papers/Zhang_Aligning_Step-by-Step_Instructional_Diagrams_to_Video_Demonstrations_CVPR_2023_paper.pdf
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
https://openaccess.thecvf.com/content_ICCV_2019/papers/Miech_HowTo100M_Learning_a_Text-Video_Embedding_by_Watching_Hundred_Million_Narrated_ICCV_2019_paper.pdf
Multimodal Language Models for Domain-Specific Procedural Video Summarization
https://scispace.com/pdf/multimodal-language-models-for-domain-specific-procedural-1to3soi699.pdf
Screencast Tutorial Video Understanding
https://openaccess.thecvf.com/content_CVPR_2020/papers/Li_Screencast_Tutorial_Video_Understanding_CVPR_2020_paper.pdf
Learning To Recognize Procedural Activities with Distant Supervision
https://openaccess.thecvf.com/content/CVPR2022/papers/Lin_Learning_To_Recognize_Procedural_Activities_With_Distant_Supervision_CVPR_2022_paper.pdf
A comprehensive survey of procedural video datasets
https://www.sciencedirect.com/science/article/abs/pii/S1077314220301314